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Abstract. Airborne laser scanning data are increasingly availablefor vari- 
ous applications including digital elevation modeling. One of the biggest 
challenges for its successful use is the varying data density caused by land 
cover properties. Areas with dense canopy cover have much lower data den- 
sity than the open areas. This varying data density may pose a problem for 
some spatial interpolation methods, such as Regularized Spline with Ten- 
sion. In thisstudy, we propose a methodology to eliminate the interpolation 
artifacts caused by the varying data density by a proper selection of data 
points entering the interpolation process. Using the application example 
from the Slovak karst area, we demonstrated the applicability of the pre- 
sented approach. 
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1. Introduction 

The main use of the airborne laser scanning (ALS) data is in producing 
highly detailed digital models of surfaces sampled by the laser beam which 
can be reflected (returned) from several surface levels and recorded (Wehr 
& Lohr, 1999). Spatial density of the bare ground returns varies with the 
land cover properties as they determine the capacity of the laser to reach 
the ground through the canopy. The varyi ng density may pose a problem for 
some interpolation methods frequently used to compute a grid- based digi- 
tal terrain model (DTM). Numerous artifacts such as abrupt spikes or 
squared terraces may occur in the contact zones of high and low data densi- 
ty. Hofierka & Cebecauer (2007) suggested some solutions albeit not focus- 
ing on the specifics of the ALS data. I n contrast to traditional data sources, 
the ALS data comprise a huge number of points entering the interpolation 
with very high computer processing power demands. There are studies pro- 



posing reduction of the input data while the output surface is satisfactory 
(Liu 2008). However, if applied globally, points over smaller landformscan 
be lost. The aim of the presented research is to tackle the issue of varying 
data density by a selective reduction of the input data and appropriate pa- 
rameterisation of the interpolation method to preserve important geo- 
morphic features in a karst terrai n (e.g. dolines). 



2. Methods and Data 

2.1. Spatial interpolation using regularized spline with tension 

The Regularized Spline with Tension (RST) is an interpolation function 
belonging to the group of radial basis functions with two imposed interpola- 
tion conditions: the function should pass through the data points and, at 
the same time, it should be as smooth as possible. These conditions are ex- 
pressed by the mi ni mizati on of devi ati ons from the measured poi nts and its 
smoothness semi norm (M itas and M itasova 1999): 
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where/(x) is the RST function, p j are the values measured at discrete 
points Xj = {x j ,y j ), j = l,...,N within a region of a 2-dimensional space, 
Wj,w are positive weighting factors and /(/) is the smoothness semi- 
norm. For , the function f(x) passes exactly through the measured points. 
The general solution of the minimization problem given by equation (1) can 
be expressed as a sum of two components (M itasova & M itas, 1993): 

/(x)=r(x)+Xv(x,x i ) (2) 
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where r(x) is a "trend" function and r{x,Xj) is a radial basisfunction with 
an expl i cit form dependi ng on the choi ce of the smoothness semi norm /(/) . 
The smoothness semi norm has been designed to include several useful in- 
terpolation properties such as an explicit form, multivariate formulation, 
smooth derivatives of higher orders, variational freedom through tension 
and anisotropy. 

I n a bivariate (2-D) formulation, the RST function generally defined by (2) 
has the f ol I owi ng expl i ci t form ( M i tasova et al ., 1995) : 
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where p = (<pr if , r 2 = (x - x } ) 2 + (y - yj ) 2 is the squared distance, 

C E =0.577215 is the Euler constant, E x () is the exponential integral func- 
tion, and^> is a generalized tension parameter which provides the control 
over the influence of derivatives of certain order on the resulting function. 
The function is implemented in GRASS GIS as the v.surf.rst command 
(Neteler and Mitasova, 2008). In this command, the interpolation process 
i s control I ed by the f ol I owi ng set of parameters: 

- tension <p , 

- smoothing w, 

- anisotropy (0,s ), where , s are the rotation and scale, respectively, 

- minimum and maximum distances between points. 

Tension <p , smoothing w and anisotropy ( 9,s ) are the key internal parame- 
ters controlling the character of the resulting surface. The tension parame- 
ter controls the behavior of the resulting surface from a thin membrane to a 
stiff steel plate. The RST method is scale dependent and the tension works 
as a rescaling parameter (Neteler & Mitasova 2008). A high tension "in- 
creases the distances between the points" and reduces the range of impact 
of each point, low tension "decreases the distance" and the points influence 
each other over a longer range. The tension parameter plays a key role in 
areas with a steep change of modeled phenomenon where overshoots and 
undershoots of the interpolated surface may occur. A surface interpolated 
with a high tension behaves like a membrane (rubber sheet stretched over 
the data points) with peak or pit in each given point and everywhere else 
the surface goes rapidly to trend. A surface with a very low tension behaves 
like a stiff steel plate and overshoots can appear in areas with rapid change 
of gradient (Hofierka, 2005). Minimum and maximum distances between 
points control the number of points that are actually used in interpolation 
after reading the input data. However, this parameter internally influences 
the effect of the tension, because the tension works as a distance- scaling 
factor. Therefore, the tension can beset with or without normalization of 
data. The data density does not affect the normalized (rescaled) tension 
parameter. 

Using the smoothing parameter w, the RST behaves like an approximation 
function, i.e. the resulting surface does not pass through the given points, 
but approximates the input values. This parameter is useful in modeling 
noisy data, where higher smoothing can filter out the noise, or alternatively, 
when the phenomenon needs to be model ed at a I ower I evel of detai I . 



The anisotropy parameters {0,s) can be used for interpolation of aniso- 
tropic data. The orientation of the perpendicular axes characterizing the 
anisotropy is defined by the rotation parameter and the scaling ratio of 
the perpendicular axes (a rati oof axes sizes) isdefined by the scale parame- 
ter s .These parameters scale distances (i.e., the value of tension) in 2 per- 
pendicular directions that should fit the spatial pattern of the anisotropic 
phenomenon. 

The number of points used for interpolation by the v.surf.rst module in 
GRASS GIS is controlled by 4 parameters: dmin - minimum distance be- 
tween points (to remove almost identical points), dmax - maximum dis- 
tance between points on isoline (to insert additional points), segmax defin- 
ing the maximum number points in the interpolation segment and npmin - 
minimum number of points used for interpolation in a segment (Neteler & 
Mitasova, 2008). The segmentation procedure of the module divides the 
whole area into a set of overlapping segments to ensure a smooth connec- 
ti on of the segments to the f i nal surface. 

These parameters can be selected empirically, based on the knowledge of 
the modeled phenomenon, or automatically, by minimization of the predic- 
tive error estimated, for example, by a cross-validation procedure (Hofier- 
ka, 2005, H of ierka et al., 2007). 

Evaluating the interpolation quality 

Evaluation of the interpolation accuracy requires comparison of the actual 
(measured) data values with the interpolated (estimated) values. Such a 
comparison leads to errors (or residuals) at the given points (Weng, 2006). 
To quantify the residuals various statistical measures are used, such as the 
mean absolute error (MAE), the mean error (ME), and the root mean 
squared error (RMSE). These error measures are defined as follows: 
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where z*i is the interpolated (estimated) value at location i and z is the 
actual (measured) value. 



In case of exact interpolators the residuals between the interpolated and 
measured values are rather low because this is an inherent feature of these 
methods. Still, in case of very rapid changes in terrain this comparison pro- 
vides valuable information about the quality of interpolation. However, this 
comparison says very little about the quality of interpolation in areas be- 
tween the measured points. To minimize the drawbacks of this direct com- 
parison and provide an assessment of the predictive capabilities of the in- 
terpolation method in area without data, a specific evaluation method 
called cross-validation (CV) or jack-knife is widely used. This method is 
based on removing one input data point at a time, performing the interpola- 
tion for the location of the removed point using the remaining samples and 
calculating the residual between the actual value of the removed data point 
and its estimate. The procedure is repeated until every sample has been, in 
turn, removed. This form of CV is also known as the "leave-one-out" meth- 
od (H of ierkaetal., 2002). 

The CV procedure is especially suitable for relatively dense data sets, since 
removing points from already under- sampled areas can lead to misrepre- 
sentation of the surface to be interpolated (the surface is smoothed). The 
minimum statistical errors calculated by CV can be used to find the opti- 
mum interpolation parameters (Mitasova et al., 1995), (Hofierka et al., 
2002). However, Hutchinson (1998) has found that CV does not always 
represent a reliable estimate of the model error, especially when a short- 
range correlation in data is present. Hofierka (2005) has shown the limited 
applicability of CV to the optimization of RST parameters in areas where 
input data are not sufficiently sampled especially in anomaly and local fluc- 
tuation areas with a higher need for representative sampling. Also, it has 
been shown that the RST parameters are flexible enough to produce inter- 
polation results that reflect the behavior of the modeled phenomenon even 
for less dense data sets. Another serious drawback of the CV method is its 
high computational demand in case of largedatasets. Interpolation must be 
iteratively repeated for every removed point and every combination of in- 
terpolation parameters which is also the case of ALS data. 

Hofierka et al. (2007) have suggested that the evaluation of interpolation 
accuracy can be also assessed using an evaluation dataset containing data 
not used in interpolation. For each evaluation point the error between actu- 
al and interpolated value is calculated and the overall accuracy is tested. 
This evaluation dataset can betaken from the i ndependent measurement of 
points or the original dataset using points selected by a random generator. 
The sel ected poi nts are then removed from the i nterpol ati on dataset. 



2.2. Study area 

The ALS data used in this research represent a 2 by 2 km portion of the Slo- 
vak Karst, East Slovakia. The area is mostly wooded with occasional mead- 
ows and scrubs comprising a plateau dissected by a deep canyon and a few 
occasional dolines (Figure 1). The altitude ranges between 540 to 704 m 
a.s.l. The data were acquired in the leaf-on conditions in 2009 and supplied 
as a filtered point cloud. The supplier claims the vertical root mean square 
error (RMSE) of 23 cm. The sample area contains 217 984 points classified 
as bare ground of spacing varying between 1-80 m with the average data 
density around 0.054 point/ m 2 (Figure 2). Spatial distribution of points is 
very uneven with a higher data density associated with open (grass) areas 
with the average data density around 0.16 point/ m 2 and the lowest data 
densityin compact forest areas with dense canopy cover (0.03 point/ m 2 ). 




Figure L Position of the study area. 




Figure 2. Orthophotomap of the study area with land cover classes (1- forest with 
dense canopy cover, 2 - forest with sparse canopy cover, 3 - grass with sparse trees, 
4 - grass and shrub, 5 - grass). 



3. Results and Discussion 

The v.surf.rst interpolation module implemented in GRASS GIS version 
6.4.2 has been used to compute the DTM of the study area. The spati al reso- 
lution of the grid- based DTM was set to 2 m which should be sufficient to 
capture most of the spatial variation in elevation including specific karst 
relief features, dolines, with the mean size in diameter of a few tens of me- 
ters. 



The initial DTM was computed using the tension set to 20 and the rest of 
parameters set to default values (Table 1). The resulting surface showed a 
few interpolation artifacts such as visible segments caused by the uneven 
data density and segmentation procedure implemented in the module. The 
solution recommended by the GRASS manual page is to increase the num- 
ber of points taken for the segment computation with larger overlapping 
areas around the segments usi ng the npmi n parameter. The npmi n parame- 
ter was set to 400 and dmin to 2 meters (Table 1). H owever, increasing the 
number of points using the npmin parameter has a limited usability for 
larger datasets because this substantially increases the computational time 
(approximately with a power of three). Despite it helped to improve the 
quality of interpolation, some interpolation artifacts in the resulting surface 
were still present. 

Hofierka&Cebecauer (2007) have suggested that the interpolation artifacts 
caused by the segmentation procedure can be also minimized by a more 
even distribution of data points. They used contour data from topographic 
maps and photogrammetric imagery. The initial DTM with interpolation 
artifacts was sampled to include data points representing the surface in 
areas with no input data points. Whilethis approach may lead to satisfacto- 
ry results also for the ALS data, the problem of efficient computation of 
large datasets consisting of millions of data points is not still fully ad- 
dressed. 

Our analysis showed that the ALS data density is usually very high on areas 
with no canopy cover (Figure 2 vs. Figure 3a). Depending on the purpose of 
representing specific relief features we can safely reduce the number of data 
points taken by the v.surf.rst module using the dmin parameter even more 
than the limit of 2 meters (for example, with dmin=6 meters) without losing 
too much information and still preserving the necessary accuracy of the 
resulting surface needed to represent specific karst landforms. However, to 
mi ni mi ze the possi bl e I oss of i nformati on i n areas with sparse data we have 
used a selective reduction of data points with the chosen limit of 6 meters. 
The selection of points is based on the interpolation error registered for 
each point during interpolation with the dmin parameter set to 2 meters. 
All points with an interpolation error lower than 0.25 m were removed and 
only points with a higher error were kept during the computation with the 
dmi n parameter set to 6 meters (Table 1). This assumes that the data is suf- 
ficiently reliable and accurate and the interpolation error is caused mainly 
by abrupt changes in the resulting surface. To preserve the character of the 
interpolated surface and compare the interpolation results, we used the un- 
normalized tension parameter of the v.surf.rst module (-t flag) set to 185 
(which correspondents to a rescaled tension of 20). The main benefit of the 
suggested approach is a reduction of interpolation artifacts arising from the 



uneven spatial distribution of the input points and much higher speed of 
computation what is so important for massive datasets produced by the 
laser scanning. 




Figure 3. 1 nput ALS points. A - entire study area. B - detail of the area in the red 
square on A. C - reduced dataset with a mi ni mal spaci ng of i nput data set to 2 m, D 
- reduced dataset with a minimal spacing of input data set to 6 m. 

The compari son of i nterpol ati on accuraci es achi eved by the suggested set of 
parameters is presented in Table land Table 2. While Table 1 presents the 
interpolation accuracy at given points, Table 2 presents the predictive error 
of the interpolation method with various parameterisation settings using an 
evaluation set of 1000 randomly selected points withheld from further in- 
terpolation. The comparison shows clearly the drawbacks of evaluation of 
interpolation accuracies at given points. The best results by RMSE were 
achieved for the setting with dmin=6 meters (RMSE =0.1432 m). However, 



Table 2 shows that the best results were achieved for dmi n*=6 meters with 
additional points with the highest interpolation errors identified during the 
computation with dmin set to 2 meters. I n contrast, the best result identi- 
fied in Table listhe worst in Table2. 

The overall interpolation results of the RST method and v.surf.rst module 
are very good because in all cases the RMSE is very close to the declared 
overall accuracy of the ALS data (RMSE =0.23 m). However, theinterpola- 
tion artifacts were clearly visible in theparameterisation using npmin=300. 
Almost complete elimination of interpolation artifacts can be seen using 
npmin=400 and further reduction of point using dmin=6 meters while still 
preservi ng the mapped geomorphi c features (Fi gure 4) . 

The lowest RMSE was achieved using the suggested approach of selective 
data points reduction (RMSE =0.2195 m). Another benefit of the proposed 
method is also in the substantial increase in the speed of computation (Ta- 
ble!). 





NP 


RE 


ME 


MAE 


RMSE 


CT 


tension=20, dmin=1.0, npmin=300 


183 418 


17.23 


-0.000028 


0.1068 


0.1596 


65 
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tension*=185, dmin=6.0, npmin=400 
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tension*=185, dmin*=6.0, npmin=400 


62 640 


13.98 


0.000015 


0.1456 
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Table L I interpolation accuracy at given points (tension* - um-normalized tension, 
dmin* - selective dmin, NP - number of points, RE - range of errors, ME - mean 
error, MAE - mean absolute error, RMSE - root mean square error, CT - computa- 
tional time in minutes). 





ME 


MAE 


RMSE 


tension*=185, dmin=2.0, npmin=400 


-0.02125 


0.1676 


0.2447 


tension*=185, dmin=6.0, npmin=400 


-0.02163 


0.1782 


0.2592 


tension*=185, dmin*=6.0, npmin=400 


-0.01137 


0.1570 


0.2195 



Table 2. Interpolation accuracy using an evaluation dataset of 1000 randomly 
selected points. 




Figure 4. Resulting DTM . Specific karst geomorphic features remained preserved 
in the interpolated surface. 



4. Conclusion 

This paper explores the methodology of efficient processing of ALS elevation 
data with a variable spatial density using the RST interpolation method. We 
demonstrated that the interpolation artifacts can be minimized by controlling 
the number of data points used in the interpolation process based on the 
minimal distances between the points as well as the selection of the most 
i important poi nts deri ved from the anal ysi s of i nterpol ati on errors. 
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2008/2.]/01-SORO 26220120007, WGS 63/12-13, WGS-PF-2012-62 
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